Add remote symbolication support with build-id and PC offset #324

jbachorik · 2026-01-07T12:34:06Z

What does this PR do?:
Adds remote symbolication support to the Java profiler, enabling native frames to be symbolicated remotely by backend services instead of locally by the agent.

Motivation:

Reduce agent overhead by deferring symbol resolution to backend services
Support stripped binaries and scenarios where debug symbols aren't available locally
Enable centralized symbol management and caching
Improve scalability for distributed profiling

How it works:
When remote symbolication is enabled (remotesym=true), the profiler stores build-ID and PC offset information for native frames instead of resolving symbols locally. This raw addressing information is serialized into JFR format and sent to backend services for symbol resolution.

Key Components:

Build-ID Extraction (symbols_linux_dd.h/cpp)
- Extracts GNU build-id from ELF binaries on Linux
- O(1) caching via _build_id_processed set to prevent redundant extraction
- Validates ELF structure with bounds and alignment checks
Packed Frame Data (profiler.h/cpp, vmEntry.h)
- 64-bit jmethodID encoding: pc_offset (44 bits) | mark (3 bits) | lib_index (17 bits)
- RemoteFramePacker utility for packing/unpacking
- Zero memory overhead - data packed directly into existing jmethodID field
- Signal-handler safe (no allocations in hot paths)
Stack Walker Integration (stackWalker_dd.h, gradle/patching.gradle)
- Integrated at native frame resolution point in all stack walking modes
- walkFP/walkDwarf: convertNativeTrace() → populateRemoteFrame()
- walkVM/walkVMX: resolveNativeFrameForWalkVM() called during stack walk
- Dynamic BCI selection (BCI_NATIVE_FRAME vs BCI_NATIVE_FRAME_REMOTE)
JFR Serialization (flightRecorder.cpp/h)
- Unpacks remote frame data during serialization
- Class name: build-id hex string
- Method name: <remote> marker
- Signature: PC offset as hex (0x<offset>)
- Frame type: FRAME_NATIVE_REMOTE (7)
Library Management (libraries.h/cpp, codeCache.h/cpp)
- Automatic build-ID extraction on profiler start
- Build-ID storage in CodeCache (one-time per library)
- Library index lookup for JFR unpacking (getLibraryByIndex)

Configuration:

# Enable remote symbolication
java -agentpath:libjavaProfiler.so=start,cpu,remotesym=true,file=profile.jfr MyApp

Observability:
Three new counters track feature usage:

REMOTE_SYMBOLICATION_FRAMES: Frames using remote symbolication
REMOTE_SYMBOLICATION_LIBS_WITH_BUILD_ID: Libraries with extracted build-IDs
REMOTE_SYMBOLICATION_BUILD_ID_CACHE_HITS: Build-ID cache efficiency

Thread Safety:

Build-ID extraction: Protected by _build_id_lock mutex
Build-ID cache: Lock-free O(1) duplicate detection
JFR serialization: Library array stable during lockAll() hold

Performance:

Identical hot-path performance to traditional symbolication
Same O(log n) binarySearch for mark checking
Zero allocation overhead in signal handlers
Reduced memory: 8 bytes per frame vs full symbol strings

Platform Support:

Linux: Full support with ELF build-id extraction
macOS/Windows: Framework ready, needs platform-specific extraction

Testing:

./gradlew testDebug
./gradlew :ddprof-lib:gtest:gtestDebug
./gradlew :ddprof-test:test --tests RemoteSymbolicationTest

Test Coverage:

C++ unit tests: remotesymbolication_ut.cpp, remoteargs_ut.cpp
Integration tests: RemoteSymbolicationTest.java (all cstack modes: vm, vmx, fp, dwarf)
Native test library: libddproftest.so with guaranteed build-id

Documentation:

doc/RemoteSymbolication.md: Complete architecture and implementation details
doc/ModifierAllocation.md: Frame type allocation

Backward Compatibility:

Default behavior unchanged (remote symbolication disabled by default)
Graceful fallback to local symbolication when build-ID unavailable
Mixed traces support both local and remote frames

For Datadog employees:

If this PR touches code that signs or publishes builds or packages, or handles
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
This PR doesn't touch any of that.
JIRA: PROF-12279

pr-commenter · 2026-01-07T14:18:28Z

Benchmarks [x86_64 wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	wall	wall
wall	on	on

Summary

Found 0 performance improvements and 1 performance regressions! Performance is the same for 14 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:akka-uct	worse [+0.409s; +1.843s] or [+1.503%; +6.771%]	unstable [-196.393MB; +377.081MB] or [-16.197%; +31.098%]

pr-commenter · 2026-01-07T14:18:54Z

Benchmarks [x86_64 cpu,wall,alloc,memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	cpu,wall,alloc,memleak	cpu,wall,alloc,memleak
wall	on	on

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 23 unstable metrics.

pr-commenter · 2026-01-07T14:19:11Z

Benchmarks [x86_64 memleak,alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak,alloc	memleak,alloc
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 16 metrics, 22 unstable metrics.

pr-commenter · 2026-01-07T14:19:18Z

Benchmarks [x86_64 alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	alloc	alloc
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 24 unstable metrics.

pr-commenter · 2026-01-07T14:19:21Z

Benchmarks [x86_64 memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak	memleak
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 23 unstable metrics.

pr-commenter · 2026-01-07T14:19:21Z

Benchmarks [x86_64 cpu]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu	cpu
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 23 unstable metrics.

pr-commenter · 2026-01-07T14:19:23Z

Benchmarks [x86_64 cpu,wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu,wall	cpu,wall
wall	on	on

Summary

Found 1 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 23 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:chi-square	better [-1.721s; -0.315s] or [-10.078%; -1.843%]	unstable [-361.052MB; +460.826MB] or [-32.822%; +41.893%]

pr-commenter · 2026-01-07T14:19:41Z

Benchmarks [aarch64 cpu]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu	cpu
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 16 metrics, 22 unstable metrics.

pr-commenter · 2026-01-07T14:19:46Z

Benchmarks [aarch64 memleak,alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak,alloc	memleak,alloc
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 17 metrics, 21 unstable metrics.

pr-commenter · 2026-01-07T14:20:09Z

Benchmarks [aarch64 alloc]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	alloc	alloc
wall	off	off

Summary

Found 1 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 22 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:scala-doku	better [-3.250s; -0.606s] or [-10.669%; -1.990%]	unstable [-196.742MB; +269.241MB] or [-18.418%; +25.205%]

pr-commenter · 2026-01-07T14:21:25Z

Benchmarks [aarch64 wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	wall	wall
wall	on	on

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 17 metrics, 21 unstable metrics.

pr-commenter · 2026-01-07T14:22:00Z

Benchmarks [aarch64 cpu,wall]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	off	off
modes	cpu,wall	cpu,wall
wall	on	on

Summary

Found 2 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 21 unstable metrics.

scenario	Δ mean execution_time	Δ mean rss
scenario:renaissance:scala-doku	better [-3.195s; -0.585s] or [-10.497%; -1.924%]	unstable [-196.499MB; +269.992MB] or [-18.381%; +25.256%]
scenario:renaissance:par-mnemonics	better [-2.971s; -0.813s] or [-12.277%; -3.359%]	unstable [-251.815MB; +344.276MB] or [-24.226%; +33.122%]

pr-commenter · 2026-01-07T14:22:15Z

Benchmarks [aarch64 cpu,wall,alloc,memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	on	on
cpu	on	on
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	cpu,wall,alloc,memleak	cpu,wall,alloc,memleak
wall	on	on

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 16 metrics, 22 unstable metrics.

pr-commenter · 2026-01-07T14:22:17Z

Benchmarks [aarch64 memleak]

Parameters

	Baseline	Candidate
config	baseline	candidate
ddprof	1.34.4	1.35.0-air_enable-remote-symbolication-for-java-profiler-d872d3bd-2-SNAPSHOT

See matching parameters

	Baseline	Candidate
alloc	off	off
cpu	off	off
iterations	5	5
java	"11.0.28"	"11.0.28"
memleak	on	on
modes	memleak	memleak
wall	off	off

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 17 metrics, 21 unstable metrics.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

VMX and VM stack walkers were bypassing remote symbolication by directly returning resolved symbol names. This caused frames to show as 'burn_cpu_recursive' instead of '<build-id>.<remote>(0x<offset>)' format. Extracted resolveNativeFrame() as shared function and added applyRemoteSymbolicationToVMFrames() to post-process VM walker output, converting resolved symbols back to RemoteFrameInfo structures when libraries have build-ids. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Replaced malloc() calls with pre-allocated pool to ensure signal handler safety and eliminate memory leaks. Pool uses atomic operations for lock-free allocation across 16 lock-strips (128 entries each, ~48KB total). Also fixed documentation inaccuracies regarding file names, usage examples, and JFR output format based on PR review feedback. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add resolveNativeFrameForWalkVM helper to profiler.h/cpp - Patch walkVM to use remote symbolication at native frame resolution point - Remove broken applyRemoteSymbolicationToVMFrames function - Add lock_index parameter to all walkVM signatures via patching.gradle - Update stackWalker_dd.h wrappers to pass lock_index - Remove dead non-const operator[] from codeCache.h - Add alignment check for ELF program headers in symbols_linux_dd.cpp 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Document resolveNativeFrame() and resolveNativeFrameForWalkVM() helpers - Add section on upstream stack walker integration via patching.gradle - Update Memory Management section with pre-allocated pool details - Add ELF security details (bounds/alignment checks) - Document walkVM integration at native frame resolution point - Remove LinearAllocator from future enhancements (already using pre-allocated pool) - Update file structure to include all modified files - Clarify stack walker integration architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The RemoteFrameInfo pool counters were being reset on every profiler start, which caused issues when reset=false (profiler restart without clearing traces). Standby traces could still reference RemoteFrameInfo objects, so resetting the pool unconditionally would corrupt those traces. This fix ensures pool counters are reset only when traces are cleared (reset=true), maintaining synchronization between trace storage and pool state: - When reset=true: traces cleared AND pool counters reset - When reset=false: traces kept AND pool counters kept Without this fix: - First test run: works (pool starts at 0) - Subsequent test runs: pool exhausted → fallback to resolved symbols - Production profilers: would fail after restart Changes: - Reset pool counters inside _call_trace_storage.clear() under lock - Initialize counter to 0 only on first-time pool allocation - Update comment to clarify counter is reset when traces are cleared 🤖 Generated with Claude Code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The previous fix (c90644d) only reset the pool when reset=true || _start_time==0. This caused pool exhaustion in parameterized tests where the profiler singleton is reused across multiple test iterations: - Test 1 (vm mode): _start_time==0, pool resets → PASS - Test 2 (vmx mode): _start_time!=0, reset=false → pool NOT reset → FAIL - Test 3+ (fp, dwarf): Same failure Root cause: Pool reset was conditional, but profiler.stop() guarantees ALL standby traces are flushed before start() is called again. Therefore, no traces can reference old RemoteFrameInfo objects, making it safe to ALWAYS reset the pool on start(). This fix adds an unconditional pool reset outside the if(reset || _start_time==0) block, ensuring the pool starts fresh for each profiling session. Also removed debug TEST_LOG statements that were added during investigation. 🤖 Generated with Claude Code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit fixes two issues that prevented remote symbolication from working correctly with the test library libddproftest: 1. Late library loading: The test library loads via JNI during test execution, after profiling starts. Enable dlopen hook for remote symbolication mode to extract build-IDs immediately when libraries are dynamically loaded. 2. JMC API frame format mismatch: Remote frames store build-ID in type.name (bare, no suffix) and "<remote>" in method.name, but the test expected "build-id.<remote>" format. Update test to correctly parse the actual JFR frame structure. Changes: - profiler.cpp: Add updateBuildIds() call in dlopen_hook when remote symbolication is enabled, enable switchLibraryTrap for remote symbolication mode, move unpatch_libraries() after JFR serialization - profiler.h: Increase REMOTE_FRAME_POOL_SIZE to 1024 entries per strip - libraries.cpp: Remove debug logging - RemoteSymbolicationTest.java: Fix frame detection logic to check methodName.equals("<remote>") AND className.equals(testLibBuildId), refactor to use IMCStackTrace.getFrames() API, clean up debug output Test results: All cstack modes (vm, vmx, fp, dwarf) now pass with remote symbolication frames correctly detected. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Refactors updateBuildIds() to use explicit per-library caching, mirroring the _parsed_inodes pattern from symbols_linux.cpp. This provides O(1) amortized performance on subsequent dlopen calls instead of O(N) iteration. Implementation: - Move Linux build-ID extraction to libraries_linux.cpp with static cache - Add libraries_macos.cpp with no-op stub for non-Linux platforms - Track processed libraries via std::unordered_set<const CodeCache*> - Thread-safe with mutex protection Performance: - Before: O(N) iteration per dlopen (100 libs × hasBuildId check) - After: O(N) hash lookups = O(1) amortized (~50-100x improvement) - Memory cost: 8 bytes per library (~800 bytes for 100 libraries) Follows project pattern of platform-specific *_linux.cpp files instead of #ifdef guards in main implementation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The VM and VMX stack walk modes were failing because they attempted to call NativeFunc::mark() on packed remote symbolication data. When frame_bci == BCI_NATIVE_FRAME_REMOTE, the method_id field contains a packed integer (lib_index in upper 20 bits, pc_offset in lower 44 bits), not a string pointer. NativeFunc::mark() does pointer arithmetic assuming the value is a string with metadata stored before it, causing undefined behavior when given a packed integer. Changes: - Add upstream patch to guard NativeFunc::mark() check with frame_bci != BCI_NATIVE_FRAME_REMOTE in stackWalker.cpp - Implement packed data encoding in populateRemoteFrame() and resolveNativeFrameForWalkVM() - Implement unpacking logic in flightRecorder.cpp - Add getLibraryByIndex() helper to libraries.h - Remove ExtendedCallFrame approach in favor of packing into existing ASGCT_CallFrame structure All stack walk modes (vm, vmx, fp, dwarf) now pass tests with remote symbolication enabled. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit simplifies the remote symbolication implementation by: 1. Added RemoteFramePacker utility struct for clean pack/unpack of data - Packs pc_offset (44 bits), mark (3 bits), lib_index (17 bits) - Provides clear API for packing/unpacking remote frame data 2. Removed marked ranges infrastructure for simplicity - Deleted MarkedRange struct and related methods from CodeCache - Removed isMarkedAddress(), sortMarkedRanges(), addMarkedRange() - Reverted to simpler binarySearch + NativeFunc::mark approach - Same O(log n) performance but less code complexity 3. Optimized to eliminate duplicate symbol resolution - populateRemoteFrame() now accepts mark parameter - Avoids duplicate binarySearch() calls (50% reduction) 4. Improved documentation accuracy - Clarified that we defer full symbol resolution, not mark checking - Documented actual performance characteristics - Fixed misleading comments about eliminating symbol resolution The key insight: Since we need binarySearch() to get symbol names for mark checking anyway, there's no performance benefit to maintaining a separate marked ranges index. The simpler approach has the same O(log n) cost but significantly less code complexity. All tests pass including RemoteSymbolicationTest covering all 4 stack walk modes (vm, vmx, fp, dwarf). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit addresses code review findings and adds observability improvements: 1. **Fix type mismatch in getLibraryByIndex** - Changed parameter from uint16_t to uint32_t - Matches lib_index packing width (17 bits = max 131K libraries) - Prevents theoretical overflow for large library counts 2. **Clarify library array stability during serialization** - Added comment explaining lockAll() ensures array stability - Corrected initial misunderstanding about race conditions - No actual race exists - serialization happens with profiling locked 3. **Add metrics for remote symbolication** - REMOTE_SYMBOLICATION_FRAMES: Track frames using remote symbolication - REMOTE_SYMBOLICATION_LIBS_WITH_BUILD_ID: Libraries with extracted build-IDs - REMOTE_SYMBOLICATION_BUILD_ID_CACHE_HITS: Cache efficiency tracking - Enables production monitoring of feature adoption and performance All changes verified with RemoteSymbolicationTest passing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Updated documentation to accurately describe the packed data approach and recent improvements: 1. **Packed Data Architecture** - Documented RemoteFramePacker utility struct - Explained 64-bit packing: pc_offset (44) | mark (3) | lib_index (17) - Clarified zero memory overhead vs previous pool approach 2. **JFR Serialization Details** - Corrected output format (no parentheses in signature) - Documented unpacking flow with RemoteFramePacker - Explained thread safety with lockAll() held 3. **Performance Characteristics** - Updated with actual performance (identical to traditional) - Documented duplicate lookup elimination optimization - Clarified O(1) build-ID cache efficiency 4. **Observability Metrics** - Added new section documenting 3 tracking counters - REMOTE_SYMBOLICATION_FRAMES: Feature usage - REMOTE_SYMBOLICATION_LIBS_WITH_BUILD_ID: Coverage - REMOTE_SYMBOLICATION_BUILD_ID_CACHE_HITS: Cache efficiency 5. **Implementation Notes** - Expanded thread safety section with lockAll() guarantees - Clarified signal handler safety with packed representation - Added "Design Evolution" explaining simplification 6. **Library Integration** - Documented getLibraryByIndex() with uint32_t parameter - Explained O(1) cache lookup in updateBuildIds() Documentation now accurately reflects the production-ready implementation with packed data, simplified mark checking, and comprehensive metrics. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Consolidates all documentation under the doc/ directory instead of having a separate docs/ directory. This matches the existing pattern where documentation files are in doc/. - Renamed docs/architecture/CallTraceStorage.md to doc/architecture/CallTraceStorage.md - Renamed docs/architecture/TLSContext.md to doc/architecture/TLSContext.md - Renamed docs/architecture/TlsPriming.md to doc/architecture/TlsPriming.md - Removed empty docs/ directory Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Updated references in ddprof-stresstest/README.md to point to the new doc/architecture/ location instead of doc/ directly. - Changed doc/CallTraceStorage.md to doc/architecture/CallTraceStorage.md This completes the consolidation of all documentation under the doc/ directory with proper subdirectory organization. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Rename all documentation files under doc/ to use consistent PascalCase naming convention for improved consistency across the codebase. Changes: - event-type-system.md → EventTypeSystem.md - MODIFIER_ALLOCATION.md → ModifierAllocation.md - profiler-memory-requirements.md → ProfilerMemoryRequirements.md - REMOTE_SYMBOLICATION.md → RemoteSymbolication.md Update references: - README.md: Update link to RemoteSymbolication.md Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Update remotesym argument parsing to match the robust pattern used by other BOOL flags (like mcleanup): - Explicitly handle false values: 'n', 'no', 'f', 'false', '0' - Explicitly handle true values: 'y', 'yes', 't', 'true', '1' - Handle no-value case: 'remotesym' alone enables the feature - Any other value defaults to enabled (consistent with mcleanup) This prevents accepting invalid values like "remotesym=yikes" and makes the behavior consistent with other boolean flags in the codebase. Added comprehensive test coverage: - Test no-value case (should enable) - Test numeric values (0 and 1) - Test "no" and "n" variants (should disable) - Test invalid values (document default behavior) Addresses review comments: - #324 (comment) - #324 (comment) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Address security and robustness issues in ELF build-ID extraction: 1. Integer overflow vulnerability (symbols_linux_dd.cpp) - Replace addition-based bounds check with subtraction pattern - Prevents integer wrapping with malicious n_namesz/n_descsz values - Progressive checks: header size, name size, desc size independently - Uses safe pattern: remaining = note_size - offset - sizeof(header) - Each component verified against remaining before subtraction 2. Test race condition (remotesymbolication_ut.cpp) - Replace fixed path "/tmp/not_an_elf" with mkstemp() - Use unique temporary file: "/tmp/not_an_elf_XXXXXX" - Eliminates race conditions in concurrent test environments - Avoids conflicts when tests run in parallel - Follows best practices for temporary file creation Addresses review comments: - #324 (comment) - #324 (comment) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Use portable PRIxPTR macro instead of %lx for formatting uintptr_t values to ensure correct behavior across all platforms. Changes: - Added #include <inttypes.h> for PRIxPTR macro - Changed format string from "0x%lx" to "0x%" PRIxPTR - Ensures correct formatting on Windows x64 where uintptr_t is unsigned long long, not unsigned long - Maintains compatibility across all platforms (Linux, macOS, Windows) The %lx format specifier assumes uintptr_t == unsigned long, which is not portable. On Windows x64, uintptr_t is unsigned long long, which would cause format string warnings or incorrect output. Addresses review comment: - #324 (comment) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Root Cause Analysis: OpenJ9 uses JVMTI-based CPU profiling when shouldUseAsgct() returns false, falling back to the j9_engine (profiler.cpp:1148-1157). JVMTI's GetAllStackTracesExtended() API only captures Java stack frames and does not include native frames from dynamically loaded libraries. Test Requirement: RemoteSymbolicationTest requires native frames from libddproftest.so to validate remote symbolication. The test calls RemoteSymHelper.burnCpu() and computeFibonacci() which invoke native methods, expecting these frames to appear in samples with build-ID and PC offset information. Observed Failure: Test consistently failed after 10 retry attempts with: - 48-52 samples collected per attempt - 0 frames from libddproftest found - Log shows: [TEST::INFO] J9[cpu]=jvmti - Library loaded successfully with valid build-ID This is a fundamental limitation of JVMTI-based profiling on OpenJ9, not a bug in remote symbolication. The feature works correctly on HotSpot JVMs which use signal-based profiling (perf_events/itimer) that captures both Java and native frames. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- flightRecorder.cpp: Fix incorrect "2025, 2026" to "2026" - arguments.cpp: Add Datadog copyright for modifications Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…dd.cpp Added detailed comments with links to official specifications: - ELF Specification from Linux Foundation - LSB ELF Note Section format - GNU build-id feature documentation - GNU binutils ld --build-id option - readelf(1) manual page Also added inline documentation of ELF note structure (Elf64_Nhdr) with field-by-field explanation and 4-byte alignment requirements per spec. This makes it easier for future developers to understand the build-id extraction implementation and verify correctness against specifications. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

jbachorik added the AI label Jan 7, 2026

jbachorik force-pushed the air/enable-remote-symbolication-for-java-profiler-d872d3bd-2 branch 3 times, most recently from b41a6eb to 74c5410 Compare January 7, 2026 13:20

jbachorik force-pushed the air/enable-remote-symbolication-for-java-profiler-d872d3bd-2 branch 3 times, most recently from 1788096 to 55578ac Compare January 9, 2026 11:57

jbachorik requested a review from Copilot January 9, 2026 20:08

Copilot started reviewing on behalf of jbachorik January 9, 2026 20:08 View session

This comment was marked as outdated.

Sign in to view

jbachorik requested a review from Copilot January 9, 2026 20:51

Copilot started reviewing on behalf of jbachorik January 9, 2026 20:52 View session

This comment was marked as outdated.

Sign in to view

Copilot AI reviewed Jan 9, 2026

View reviewed changes

jbachorik requested a review from Copilot January 9, 2026 22:55

Copilot started reviewing on behalf of jbachorik January 9, 2026 22:56 View session

jbachorik and others added 3 commits January 13, 2026 17:19

This comment was marked as outdated.

Sign in to view

Fix the 'remotesym' arg name

45d0122

jbachorik force-pushed the air/enable-remote-symbolication-for-java-profiler-d872d3bd-2 branch from 5ee68d5 to 45d0122 Compare January 13, 2026 16:23

jbachorik requested a review from Copilot January 13, 2026 16:31

Copilot started reviewing on behalf of jbachorik January 13, 2026 16:32 View session

This comment was marked as outdated.

Sign in to view

jbachorik and others added 15 commits January 15, 2026 13:50

jbachorik marked this pull request as ready for review January 16, 2026 16:44

jbachorik requested a review from a team January 16, 2026 16:46

jbachorik and others added 2 commits January 16, 2026 16:49

Update copyrights to 2026 on modified files

527aba3

- flightRecorder.cpp: Fix incorrect "2025, 2026" to "2026" - arguments.cpp: Add Datadog copyright for modifications Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add remote symbolication support with build-id and PC offset #324

Are you sure you want to change the base?

Add remote symbolication support with build-id and PC offset #324

Uh oh!

Conversation

jbachorik commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 wall]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 cpu,wall,alloc,memleak]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 memleak,alloc]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 alloc]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 memleak]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 cpu]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [x86_64 cpu,wall]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [aarch64 cpu]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [aarch64 memleak,alloc]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [aarch64 alloc]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [aarch64 wall]

Parameters

Summary

Uh oh!

pr-commenter bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks [aarch64 cpu,wall]

Parameters

Summary

Uh oh!

jbachorik commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading

pr-commenter bot commented Jan 7, 2026 •

edited

Loading